1. INTRODUCTION

As VLSI technology scales down, an increasing number of devices are being packed into the
available silicon real estate. Unprecedented levels of integration of devices and functionality have been
achieved over the past few technology nodes. As the technology scales, the on-chip interconnect wires are
also becoming increasingly important, and the overall system performance is increasingly dominated by
the on-chip interconnects [1]. Interconnects are becoming more important than the devices themselves in terms of
power, delay and packing density, the figures of merit used during the selection and implementation of a device
in chip fabrication. This technology is used in numerous electronic applications such as data communication
networks, digital signal processing, automobiles, multimedia and medical applications because of its high speed,
reliability and ability to reduce the size of various electronic components.

With advances in VLSI technology, the number of transistors per chip is increasing
rapidly. The growth in transistor count makes the interconnects more complex and increases the
interconnect length, which in turn increases the die size. As a result, global interconnects, which
span the length of the silicon die, do not scale with the devices. This phenomenon is known as reverse
scaling of interconnects, and it leads to increased power dissipation. Interconnects therefore greatly affect chip
performance and robustness. The power consumption due to interconnects can be as high as 40% of the total
power dissipation in high-performance gate arrays [2].

Numerous techniques have emerged in recent years to reduce power dissipation and propagation
delay and to increase chip design efficiency, such as current-mode signaling [2-3], signal modulation [4],
transmission line interconnects [5] and amplitude pre-emphasis techniques [6-7]. In [8], [9] and [10], a simple
capacitively driven wire model is applied in the time domain and the frequency domain, respectively, to enhance
bandwidth. An effective way to reduce power dissipation and increase the efficiency of interconnects is to
reduce the voltage swing of the interconnect lines [11], [12]. However, this approach incurs additional power
consumption and delay because of the voltage converters it requires [13]. Another simple way to enhance
interconnect bandwidth is repeater insertion [14]. However, repeaters dissipate a large amount of power and
need a large area [1, 15]. A few other approaches, namely modulated signaling [4] and pulsed current-mode
signaling [1], incur high latency. Moreover, because of the reduced voltage swing, these techniques
are highly sensitive to parameter variations, suffer from a low noise margin and require highly complex
designs, which makes them difficult to use in industry. Crosstalk, substrate coupling, interconnect
delay, transmission line effects and power supply integrity problems are effects associated with low signal
reliability and integrity.

Several essential factors in VLSI determine the overall efficiency, cost, feasibility and
reliability of a design, namely power dissipation, propagation delay and power-delay product. The power
consumption of interconnects can be optimized by controlling the power in drivers, interconnect segments,
receivers and repeaters. The propagation delay can be reduced by placing repeaters or accelerators
along the global RC wires, which also helps to speed up the design. The energy consumption of the
interconnect design model can be specified by the power-delay product. Efficient use of
interconnects reduces power dissipation and propagation delay and enhances the overall performance of the
system. There is therefore a need for a technique that reduces power dissipation, delay and power-delay
product to provide high efficiency and larger design throughput. To this end, we introduce a Power
Efficient Clock Distribution Technique for Switched Capacitor DC-DC Converters.

In our model, a voltage regulation technique of the kind used in complex integrated circuit designs
is applied at the level of individual modules. A clock distribution scheme is therefore required for switching in the dc-dc
converters. These clocks are routed to the different converter modules from a common source on the IC. This
clock distribution scheme needs a specific amount of energy in the interconnect wires and their associated circuitry.
This approach significantly enhances power savings, since the last stage of a clock network has low
switching capacitance. Our model adopts the basic idea of limiting the voltage swing along the interconnect to
improve performance. It provides a better voltage swing and reduces noise impedance to a large extent
in the clock distribution technique.

In this work, we also focus on bus architecture, including its design, types, working and operation in VLSI
design. The power dissipation in buses is up to 40% of the total power consumption of a VLSI chip.
Therefore, a new architecture is presented that reduces the power consumption in the on-chip interconnection
buses by reducing the voltage swing using our proposed method, which can be very helpful in
future industrial applications. A comprehensive review of available bus drivers and bus receivers and
their drawbacks is also given, with a view to reducing power consumption. These bus design architectures can be further
used in nanotechnology and across VLSI SPICE parameters and frequencies in various VLSI applications over the next
decade.

This paper is organized as follows. Section 2 reviews related work. Section 3 presents the test platform
for evaluating the performance of the design topologies. Section 4 presents the two design topologies for the
signaling schemes. Section 5 presents the results and analyzes the performance of the two schemes. Finally,
Section 6 presents the conclusion.


2. RELATED WORK 

In recent years, the design of interconnects in VLSI architectures has gained immense popularity, and it
is one of the fastest growing research topics because of its numerous applications in medical,
multimedia and electronic devices such as computers, mobile phones and TVs. With the increasing popularity of micro- and
nano-electronic devices, there is a need to shrink interconnects to make VLSI chips smaller
while keeping power consumption and delay to a minimum. A low voltage swing technique is a promising
option for reducing power dissipation and delay in VLSI applications. This
technique can largely reduce the computational complexity present in high voltage swing techniques.
Hence, it helps to shrink interconnect size in different VLSI applications by reducing noise.
Designing low-swing signaling schemes for driving long interconnects using a driver-receiver
architecture is a very cumbersome task. Many researchers have done a significant amount of work in
this field, as follows:

In [16], a novel mixed technique was introduced that combines a low-swing technique and a regenerator technique to
reduce power consumption and delay and to increase the feasibility of the system in the global interconnects of a
chip. However, the noise margin is very high in this technique, which leads to high power consumption. In [17], a
novel LSDFF cell and a novel swing- and slew-aware CTS algorithm were presented to remove a
significant amount of noise and power consumption in clock networks. However, this approach generates high
insertion delay and clock skew, which degrades VLSI chip performance. In [18], robust magnetic
skyrmion low-power sensors are used to reduce the power consumption in global interconnects. This
technique can provide high energy efficiency and reduces computational complexity. In [19], a buffered
ST sensor based voltage swing technique was adopted to optimize total delay and energy consumption.
This technique also reduces noise immunity. However, its high computational complexity can affect the overall
performance of the system. In [20], an LMS adaptive filter based distributed arithmetic architecture was adopted to
reduce computational complexity and improve interconnect performance in critical paths. However, the memory
requirement of the distributed arithmetic architecture is very high. In [21], a hardware-oriented architecture
based on fuzzy logic, block truncation coding and digital half-toning was adopted to reduce computational
complexity and memory requirement for VLSI implementation. However, the compression ratio of this
technique is very low. In [22], a 3-D NoC-bus hybrid architecture was presented to make routing more efficient in
the bus architectures of VLSI designs through 3-D IC technology, thereby reducing power consumption; this system
also reduces the computational cost. However, because of high interconnect traffic, thermal issues can degrade
the performance of the system. In [23], a wireless 3-D Network-on-Chip (NoC) technique based on
inductive coupling was adopted to improve the performance of bus architectures. However, the number of chips and
the power supply are limited in this model.

In this paper, we introduce a robust Power Efficient Clock Distribution Technique for a
Switched Capacitor DC-DC Converter architecture, which helps to overcome the drawbacks of existing
approaches. The architecture of our design is built so that the power consumption and delay between
interconnects are minimized for VLSI implementation. A clock distribution scheme is used to switch
power between DC-DC converters without compromising the performance of the system. Our model adopts the basic
idea of limiting the voltage swing along the interconnect to improve performance. It provides a
better voltage swing and reduces noise impedance to a large extent in the clock distribution technique. In addition, a
new architecture is presented that reduces the power consumption in the on-chip interconnection buses by reducing
the voltage swing using our proposed method, which can be very helpful in future industrial
applications.


3. TEST ARCHITECTURE 

The proposed driver-receiver circuits are tested using the test architecture presented in Figure 1
and Figure 2, which models the interconnect length and the parasitic capacitance from interconnect to ground.
The interconnect, which is fabricated in the Metal-3 layer, can vary in length up to —mm. It is modeled as a 13
distributed resistance-capacitance model. All circuits are simulated with a receiver output load capacitance of
25fF. The fan-out is modeled as an extra distributed capacitive load of around 250fF/mm of interconnect length.
All the circuits are analyzed under identical loading conditions, power supply and gate-source voltage. The
test conditions are listed in Table 1.

Figure 1. Test Architecture

Figure 2. Interconnect Model


Table 1. Test Conditions 


Parameter             Symbol   Value
Power Supply          Vddh     1.0V
Gate Source Voltage   Vgs      0.54V
Loading Condition     CL       250fF/mm
                      CLout    25fF
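
As a rough illustration of how the distributed loading described above affects a wire, the following sketch estimates the Elmore delay of a uniform N-segment RC ladder. Only the 250fF/mm fan-out load is taken from Table 1; the wire resistance and wire capacitance per millimetre (`R_PER_MM`, `C_WIRE_PER_MM`), the segment count and the driver resistance are hypothetical placeholders, so the printed numbers are not results of this work.

```python
# Sketch only: Elmore delay of a uniform N-segment RC ladder.
FANOUT_C_PER_MM = 250e-15  # F/mm, fan-out load from Table 1
R_PER_MM = 100.0           # ohm/mm, ASSUMED wire resistance (not from the paper)
C_WIRE_PER_MM = 200e-15    # F/mm, ASSUMED wire-to-ground capacitance


def elmore_delay(length_mm, segments=10, r_driver=0.0):
    """Return the Elmore delay in seconds at the far end of the ladder."""
    r_seg = R_PER_MM * length_mm / segments
    c_seg = (C_WIRE_PER_MM + FANOUT_C_PER_MM) * length_mm / segments
    delay = 0.0
    r_upstream = r_driver
    for _ in range(segments):
        r_upstream += r_seg          # resistance from the source to this node
        delay += r_upstream * c_seg  # each node contributes R_upstream * C_node
    return delay


if __name__ == "__main__":
    for length in (1, 5, 10):
        print(f"{length} mm: {elmore_delay(length) * 1e12:.1f} ps")
```

With zero driver resistance, this estimate grows with the square of the wire length, which is why the per-millimetre fan-out load matters most for long interconnects.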


4. DESIGN TOPOLOGIES 

This section describes two design schemes for the power-efficient routing of clock signals for
the switched capacitor dc-dc converters. It also discusses the operation of these two design
topologies.


4.1. Functioning of scheme-1 

Scheme-I mainly works on the idea of limiting the voltage swing along the interconnect to improve
performance. This limitation avoids the need for an external power supply for the reference voltage.
The voltage swing limit is given by Equation 1.


$$V_{tn} \le V_{s} \le V_{dd} - |V_{tp}| \tag{1}$$

Equation 2 gives the energy saving ratio.

$$\frac{E_{low}}{E_{tot}} = \frac{V_{s}}{V_{dd}} \approx \frac{V_{dd} - |V_{tp}| - V_{tn}}{V_{dd}} \tag{2}$$
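
To make the energy saving ratio concrete, the sketch below evaluates it numerically, reading Equation 2 as the ratio of the reduced swing to the supply voltage, with the swing approximated as the supply minus the two threshold magnitudes. This reading is an interpretation of the equation, and the threshold values `VTN` and `VTP` are assumed purely for illustration; only the 1.0V supply comes from Table 1.

```python
def energy_saving_ratio(vdd, vtn, vtp):
    """Approximate E_low / E_tot as (Vdd - |Vtp| - Vtn) / Vdd."""
    swing = vdd - abs(vtp) - vtn  # reduced swing on the interconnect line
    if swing <= 0:
        raise ValueError("thresholds leave no usable swing")
    return swing / vdd


VDD = 1.0    # V, power supply from Table 1
VTN = 0.25   # V, ASSUMED NMOS threshold (illustrative only)
VTP = -0.25  # V, ASSUMED PMOS threshold (illustrative only)

ratio = energy_saving_ratio(VDD, VTN, VTP)
print(f"E_low / E_tot = {ratio:.2f}")  # prints E_low / E_tot = 0.50
```

Under these assumed thresholds the low-swing line would draw about half the energy of a full-swing line; the actual figure depends on the device thresholds of the process used.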


The schematic diagram of the driver-receiver configuration is shown in Figure 3. The driver and
receiver end circuits are detailed in Figure 4 and Figure 5. The driver section of the proposed scheme operates
in three different modes: active, diode-connected and source follower. In the fully active mode, the driver
provides full drive capability to charge or discharge the interconnect line. The diode-connected mode of the
driver limits the voltage swing and also offers lower impedance. Finally, the source follower mode provides
better noise immunity. When the interconnect is driven in the opposite direction, the transistor finally turns off.

The overdrive beyond the switching limits improves the propagation delay and can be controlled by 
proper transistor sizing. The scheme gives higher drive strength for the same area as it has only one series 
transistor. In case the line is inactive for long periods, voltage level guards can be incorporated. 

When the input is HIGH, transistors M3, M4 and M6 are ON and M1 (N driver), M2, M5 and M7 are
OFF. When the input transits from HIGH to LOW, M4, M3 and M8 (P driver) are turned OFF, while the gate
of M1 is charged through M5-M6, fully activating the output transistor (mode 1). As the interconnect line
discharges towards ground, M7, which is active, turns M6 OFF and turns M2 ON. The gate of M1 holds its
charge while the line is discharging, until the line is low enough to activate M2. When M2 is active, the gate of M1 is
driven to match the line (mode 2). Upon a LOW-to-HIGH transition of the input, the same events occur on the upper
half of the circuit (M8 side).

The receiver section is a simple inverter with an enable signal. A balanced inverter is selected
because of its simplicity and faster performance under conditions where the driven line crosses Vdd/2 at every
transition. Long interconnect lines can lead to mismatch between the driver and receiver transistors. The
enable signal then turns off the receiver, avoiding any bias current when the line is not used.

At the far end of the transmission line from the driver, a simple inverter with an enable signal is used, as
in Figure 5. The enable signal can be used to turn off the inverter when the interconnect line is not used.

Figure 4. Driver End Schematic for scheme-1

Figure 5. Receiver End Schematic for scheme-1














4.2. Functioning of scheme-2 
The circuit schematic for the second design topology is shown in Figure 6. 

Figure 6. Driver-receiver configuration Schematic diagram, Scheme-II

Figure 7. Driver end schematic, Scheme-II





Figure 8. Receiver end schematic, Scheme-II 


Indonesian J Elec Eng & Comp Sci, Vol. 10, No. 1, April 2018 : 27 — 36 


Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752


The circuit on the receiver end of the interconnect line is shown in Figure 8. The pass transistor M1
provides isolation from previous stages. In the absence of this pass stage, the current from Vdd through M3
would flow towards the previous stage. With the internal node isolated, M4 pulls up the gate of M3 at the input.
Transistors M13, M14 and M15 reduce the output pull-down transition time. The static power
dissipation is eliminated by transistors M11 and M12, which do not allow static current to ground when M2
is not fully ON. An additional pull-up device at the receiver output improves the low-to-high
propagation delay.


5. SIMULATION RESULT 


The simulation results obtained for Scheme-I and Scheme-II are discussed in this section. The schemes are
evaluated on three performance metrics: delay, power consumption and power-delay product.
A comparative study of the two schemes is presented.


5.1. Delay 

Figure 9 and Figure 10 show the propagation delay versus interconnect length for Scheme-I and Scheme-II, respectively.
The interconnect length was varied from 1mm to 10mm. For Scheme-I, the propagation delay follows a linear
relationship with the interconnect length, while for Scheme-II the relationship is quadratic. The propagation delay is
lower for Scheme-II over the interconnect lengths investigated in this work. However, for longer interconnect lengths,
the delay in Scheme-II can rise appreciably above that in Scheme-I because of the quadratic
relationship.
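
The length at which a quadratic delay trend would overtake a linear one can be estimated with a short calculation. The sketch below is only an illustration of that reasoning: the linear model `a_lin*L + b_lin` and the quadratic model `a_quad*L**2 + b_quad`, along with all coefficient values, are hypothetical and are not fitted to the measured curves.

```python
import math


def crossover_length(a_lin, b_lin, a_quad, b_quad):
    """Length beyond which a_quad*L**2 + b_quad exceeds a_lin*L + b_lin.

    Returns None if the quadratic curve is never below the linear one
    for positive lengths.
    """
    # Solve a_quad*L^2 - a_lin*L + (b_quad - b_lin) = 0 for the larger root.
    disc = a_lin ** 2 - 4.0 * a_quad * (b_quad - b_lin)
    if a_quad <= 0 or disc < 0:
        return None
    root = (a_lin + math.sqrt(disc)) / (2.0 * a_quad)
    return root if root > 0 else None


# HYPOTHETICAL coefficients (ns and ns/mm), for illustration only.
L_cross = crossover_length(a_lin=1.0, b_lin=5.0, a_quad=0.05, b_quad=1.0)
print(f"quadratic model overtakes linear model near {L_cross:.1f} mm")
```

With these made-up coefficients the quadratic model stays faster throughout the 1mm to 10mm range and only becomes slower at a longer length, which mirrors the qualitative behaviour described above.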

Figure 9. Propagation delay versus interconnect length, Scheme-I

Figure 10. Propagation delay versus interconnect length, Scheme-II

5.2. Power Consumption 
Figure 11 shows the power consumption of Scheme-I for interconnect lengths from 1mm to
10mm. Figure 12 shows the same result for Scheme-II. Scheme-I performs better than Scheme-II by an order
of magnitude. Power consumption in Scheme-I shows a steady increase with interconnect length. In Scheme-II,
the power rises and then drops around 8mm interconnect length. However, even the minimum power
consumption in Scheme-II is an order of magnitude more than the maximum consumption in Scheme-I.

Figure 11. Power consumption versus interconnect length, Scheme-I

Figure 12. Power consumption versus interconnect length, Scheme-II


5.3. Power-Delay Product 

Figure 13 shows the power-delay product of Scheme-I for interconnect lengths from
1mm to 10mm. Figure 14 shows the same result for Scheme-II. As discussed in Sections 5.1 and 5.2, the delay
performance is better for Scheme-II while the power performance is better for Scheme-I. In such a case, the
power-delay product is a good criterion for comparing the two schemes. As shown in Figures 13
and 14, the power-delay product of Scheme-I is an order of magnitude less than that of Scheme-II.
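
The way the power-delay product resolves a split result (one scheme faster, the other lower in power) can be sketched as follows. The operating points below are hypothetical values chosen only to mirror the qualitative trend reported here, namely a small delay advantage for Scheme-II against an order-of-magnitude power advantage for Scheme-I; they are not measured data.

```python
def power_delay_product(power_w, delay_s):
    """Energy-like figure of merit: power (W) times delay (s), in joules."""
    return power_w * delay_s


# HYPOTHETICAL operating points, for illustration only (not measured data).
schemes = {
    "Scheme-I": {"power_w": 10e-6, "delay_s": 17e-9},
    "Scheme-II": {"power_w": 100e-6, "delay_s": 16e-9},
}

pdp = {name: power_delay_product(**vals) for name, vals in schemes.items()}
best = min(pdp, key=pdp.get)
for name, value in pdp.items():
    print(f"{name}: {value:.2e} J")
print("lower power-delay product:", best)
```

Because the product weights both metrics equally, a tenfold power difference dominates a delay difference of a few percent, so the lower-power scheme comes out ahead in this example.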

Figure 15 shows the delay performance of the two schemes for varying loads. Both schemes
show a steady increase in delay with load, as expected.

Figure 13. Power-delay product versus interconnect length, Scheme-I

Figure 14. Power-delay product versus interconnect length, Scheme-II


[Plot: propagation delay (ns) of Scheme-I and Scheme-II versus load capacitance (fF), 0 to 1000 fF]

Figure 15. Propagation delay v/s Load capacitance


6. CONCLUSION 

In this paper, two optimized low-swing signaling schemes based on MOS current-mode logic circuits
for driving long on-chip global interconnects are discussed. Driver-receiver pair architectures
at the two ends of an interconnect line using Scheme-I and Scheme-II are discussed and simulated. It is
demonstrated that the proposed low-swing signaling schemes for interconnects outperform
other schemes in terms of propagation delay, power consumption and power-delay product. It is concluded
that Scheme-II shows better performance than Scheme-I in terms of propagation delay, while Scheme-I is
better in terms of power consumption. Scheme-I also performs better than Scheme-II in terms of the power-delay
product criterion. Simulation results verify that this analysis can provide a valuable guideline for
interconnect driver design in various VLSI applications. Hence, as the target of the present work is to develop a
power-efficient scheme for signal routing through interconnects, it can be concluded that Scheme-I
is the design of choice.